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Abstract. The current distribution of language size in terms of speaker population 
is generally described using a lognormal distribution. Analyzing the original real data 
we show how the double-Pareto lognormal distribution can give an alternative fit 
that indicates the existence of a power law tail. A simple Monte Carlo model is 
constructed based on the processes of competition and fragmentation. The results 
reproduce the power law tails of the real distribution well and give better results for a 
poorly connected topology of interactions. 
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The astonishing similarity between biological and language evolution has attracted 
the interest of researchers familiar with analyze genetic properties in biological 
populations with the aim of describing problems of linguistics [HEllS]. Their techniques, 
for instance, have succeeded in explaining some interesting features related to the 
coexistence of the about 7, 000 languages present on Earth. 

More recently, the effort to connect evolutionary biology with linguistics has 
emerged into a study that describes the effects of competition between languages on 
language evolution. The work of Abrams and Strogatz in 2003 [4], who analyzed 
the stability of a system composed of two competing languages, can be considered 
as the starting point of this new research line. In the following years, other groups, 
simultaneously, developed new analytical and computational models [3 [6l [71 [H [9l [TOl 
[TTl [12]. An overview of the fast increasing literature on language competition can be 
found in refs. [131 Ej- 

Languages are by no way static. They continuously evolve, changing, for example, 
their lexicon, phonetic, and grammatical structure. This evolution is similar to the 
evolution of species driven by mutations and natural selection [15]. Following the 
common picture of biology [16], changes in the language structure may be seen as the 
result of microscopic stochastic changes caused by mutations. Natural selection, which 
may be caused by competition between individuals, positively selects some of these 
small changes, depending on their reproductive success. A sequence of macroscopic 
observations corresponds to such a microscopic picture. In language evolution, these 
macroscopic events are, for instance, the origination of two languages from an ancestor 
one - for example, the emergence of the Romance languages from Latin - or the extinction 
of a language. 

In this work we model the evolution of languages from a macroscopic point of 
view. More precisely, the microscopic processes responsible for the differentiation of 
one language into two new languages are not implemented here. Effectively, we neglect 
the microdynamics that generates language changes, at the level of individuals, and 
we just describe their effect on extinction and differentiation at the level of languages, 
throughout a phenomenological mechanism of growth and fragmentation. Language 
change is determined by the dynamics of the size of its population. The fact that rare 
languages are less attractive for people to both learn and use is the mechanism considered 
as the origin of these population size changes. Consequently, this mechanism introduces 
a sort of frequency dependent reproductive success for different languages. Statistical 
data supporting this conjecture can be found in Ref. [17]. Languages documented as 
declining are negatively correlated with population size. This phenomenon is similar 
to the AUee effect in biology [18]. With a simple computational model, based on 
the above described mechanisms, we compare simulation results with empirical data 
of the distribution of population sizes of languages (DPL) of Earth's actually spoken 
languages [19] (see Fig. [1]). 

Several attempts have recently been made to reproduce the DPL. Two works 
focused on the apparently lognormal shape of the DPL. Tuncay [20] described language 
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differentiation by means of a process of successive fragmentations, in combination with 
a multiplicative growth process. In a recent paper by Zanette |2T], the dynamics of 
language evolution is considered as a direct consequence of the demographic increase 
of the speaker populations, which is modeled by means of a simple multiplicative 
process. Unsurprisingly, these models obtain pure lognormal distributions for the DPL, 
as expected from the application of the central limit theorem for multiplied random 
variables [22l [23] . Unfortunately, the DPL is known to significantly differ from a pure 
lognormal shape |17j . 

The Schulze model [8] relies on ideas already successfully applied to model biological 
evolution. Languages are identified by a bit string that represents their characteristic 
features. New languages are produced by mutations of these features and small 
languages are discriminated by competition. These simulations, during the transient 
towards the stationary state, are able to generate data with a distribution similar to the 
DPL. A review of the Schulze model and its application to different problems connected 
to language interaction can be found in Ref. [Hj. 

Another model, the Viviane model, simulates human settlement on an unoccupied 
region. Languages suffer local mutations, until the available space becomes completely 
populated [12]. The introduction of a bit string representation into the Viviane model 
was able to generate new results which reproduce the DPL over almost the entire range 
well, except for large language sizes [211 EH] . The bit string approach gives an explanation 
for the deviation of the DPL from a lognormal distribution for small population sizes. 
The DPL changes its shape depending on the method used to distinguish different 
languages from dialects. When restricting the comparison between languages to be 
based on a small number of different features in language structure, the deviation for 
small language populations appears. This interpretation was confirmed by using a simple 
model [26] that neglects the geographic effects present in the Viviane model. 

A thorough look at the DPL may suggest that the deviations from the lognormal 
shape could be due to power law decays [27]. We investigate this idea by fitting the 
DPL with a double-Pareto lognormal distribution and additionally comparing it to the 
simulation results of our new model. 

The paper is organized as follows. First we analyze the DPL and show that the 
double-Pareto lognormal distribution [281 [29] gives an alternative fit. The next section 
introduces the computational model. In the last two sections, the simulation results are 
presented and the conclusions are given, respectively. 

1. Statistical analysis of the distribution of languages 

Reference [I9] provides the number of people speaking a given language as their own 
mother tongue. There, 6,912 different languages have been classified and for 6,142 of 
them their speaker population was estimated. In the data set of 6,142 languages we 
also verify a big difference between the median, 6000 speakers, and the average of the 
number speakers of a language, circa 1.052 x 10^ speakers. This discrepancy is related 
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Figure 1. DPL distribution and respective fits with a double-Pareto lognormal 
distribution (continuum line) and lognormal one (dotted line) . We counted the number 
of languages with speaker populations between exp(0.625n) and exp(0.625(7i + 1)) for 
n = 0, 1, 2, 3, .... Although the double-Pareto lognormal distribution shows the same 
deviation from the real data for small values of L, it presents a better adjustment of 
the whole curve when in comparison to the log- normal fit (see Dks values in the text). 
In particular, it is possible to see how the fat tail for languages with a big population 
size is well fitted by a power law. 



to the fact that only 326 languages have at least one million speakers. When they 
are assembled, these languages account for more than 95% of world's population with 
the remaining 5816 (94.6%) languages encompassing the left-over speaking population. 
This draws attention to the fat tailed behaviour of the distribution of the number of 
speakers. Starting from these data, we make a histogram by counting the number of 
languages with population size enclosed in a bin with values between exp(0.625n) and 
exp(0.626(n -|- 1)) where n = 0,1,2,.... This distribution defines the DPL. A pure 
lognormal shape can be described by. 



where A, fi and a are the parameters of the distribution. A lognormal distribution 
corresponds to a parabola in a double-logarithmic plot (see Fig. [1]). 

In order to calculate the parameter values by maximum log-likelihood estimation 
we have to transform the distribution to obtain a probability density function. As the 
size of the bins in the DPL increases exponentially, the frequency of languages with a 
certain population size is calculated by dividing the DPL by the bin width which leads 
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after further normalization in order to satisfy ^ic{L) = 1. 

The fit by maximum log-hkehhood estimation with the lognormal distribution of 
eq. ([2]) gave the parameter values /i = 8.70 and a = 3.20 with a Kolmogorov-Smirnov 
distance of D^s = 0.0253. These values of the lognormal distribution are the same as 
the ones obtained in ref. [17], where logiQ^L) was used instead of ln(L). 

Unfortunately, as stated before, the DPL of real data is known to significantly differ 
from lognormality and deviations can be easily observed for small and large values 
of L. 

In particular, a thorough look on the DPL shows that it does not seem to decay 
exponentially. Hence, we have tried to fit the data using a distribution that can account 
for such a power law decay. As both tails could show power laws, we assumed the 
so-called double Pareto lognormal distribution, introduced by Reeds [28| [29]. 
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In this case, using the maximum log-likelihood procedure, we have estimated the 
following values for the function parameters: a = 0.682, j3 = 0.603, r = 2.33, and 
u = 8.89. This set of parameters yields Dks = 0.0198 which is slightly smaller than the 
distance presented by the log-normal fit. With these values we can compute the critical 
P value, Pc, which is given by 
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The closer P^ is to zero, the poorer the fit is. With the values presented previously 
we have Pc — 1.6 x 10"^ for the fit for Eq. ([3]) and Pc* ~ 8 x lO"'^ for the log- 
normal adjustment. This fact suggests that the double Pareto distribution allows a 
better description of real data. Figure [1] shows the results of this analysis comparing 
the corresponding DPLs. 

It is likely to obtain better results by using a fitting function with a larger number 
of adjustable parameters. The Akaike information criterion [30] provides an estimation 
whether the introduction of new parameters into the fitting procedure is useful, 

D c c 

AIC = 2k + Nhg^ . (5) 

k denotes the number of fitting parameters and RSS is the residual sum of squares. 
Smaller values of this criterion correspond to better results. We obtain AIC = —10.25 
for the fitting with the lognormal function and AIC = —12.08 for the one with the 
double-Pareto lognormal function. 
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2. Interaction versus fragmentation 

This section introduces our model. As we stated before, we describe the behavior of 
languages on the macroscopic scale of sub-populations, neglecting the languages internal 
structure, constituted by the speaking preferences of each individual. Each language i 
is characterized exclusively by the number of its speakers Lj. The origination of one 
new language is obtained throughout fragmentation. Fragmentation is implemented as 
follows: at each time step, each language can break into two new languages with a fixed 
probability F: 

U{t) Li{t + 1) = ]^U{t) , L„e»(t + 1) = ]^U{t) . (6) 

For the sake of simplicity, the two new languages contain exactly half of the population 
of the ancestor language (fragmentations into two parts of unequal size do not alter the 
qualitative shape of the distribution of the fragments size). Taking into account only 
the fragmentation process, the number of languages, A'^^,, increases until each language 
has only one speaker. For large numbers of Nl, there is an interval, during the time 
evolution, when the population distribution displays a pure lognormal shape. 

It is interesting to remember that a simple soluble model analog to this 
fragmentation process is the discrete sequential fragmentation of a segment. In this case, 
through a rate equation approach, it is possible to show that an explicit asymptotic 
solution is given by a lognormal distribution. This is independent on the number of 
pieces of each breaking event [31] . 

The interaction between languages is implemented by a term that controls the 
growth of each language in dependence of its relative population size. At each time 
step, the population of each language i follows the rule: 

f^.W + TT^TT E forL,(t + l) 

y language removed otherwise 
where denotes the number of speakers of language i at time step t, / the interaction 

strength and NTot{t) = X] Li(t) is the total number of speakers. The second term, 

i=0 

which can be positive or negative, enhances the reproductive success of the languages 
with more speakers, and causes the decrease of the number of rare language speakers. 
The third term, which is controlled by the parameter V, is a. cause of random death 
that avoids the unlimited growth of the population. The term limits the growth of each 
language to have less than V speakers. It is important to notice that this third term is 
necessary because the second one causes an uncontrolled growth in the total population. 
In fact, the second term does not conserve the total population because languages with 
less than one speaker {Li{t) < 1) are removed from the simulation. 

We perform the simulations over two different topologies. In the first, the languages 
are ordered in a chain, with periodic boundary conditions, and interact with their 
nearest neighbors (NN). New languages, generated by fragmentation, are placed between 
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Figure 2. Local interactions, (a): Averaged total population < Ntoi > for different 
values of V and fixed / = 1,000,000 and F = 0.1. For large V (we took the values 
V = 10, 100, 1000, 10^), the total population saturates, (b): Rescaled population 
distribution for different values of V (same symbols and same parameter values as in 
(a). The distributions can be collapsed for large values of V. Inset: The same but 
unsealed population distributions. 



the ancestor language and the following one. This implementation describes restricted 
local interactions, that can only happen between neighbor languages, in a 1-dimensional 
space. 

In the second implementation, the interaction is carried out between all languages, 
in a mean field like model. For each iteration, every single language interacts with the 
whole set of languages. This model corresponds to the ideal case of a fully connected 
world, without geographic constraints. 
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3. Simulation results 

All our simulations runs begin with one language having a number of speakers that can 
vary from 1, 000 to 1, 000, 000. However, the final state of the system does not depend 
on the initial conditions. The system is characterized by three global quantities: the 
number of languages A''^,, the total population Nxot and the distribution of language 
sizes in terms of speaker population. This distribution is averaged over time steps. The 
results are collected after the stationary state is reached, i.e. with both Nl and Nxot 
fluctuating around some fixed value. Note, that eq. ([7]) allows the case Ntoi > V ior 
Nl>2. 

Local interactions 

We start with languages arranged on a 1-dimensional array. This setup can be 
interpreted as the most simple implementation to describe small range interactions 
corresponding to a weakly connected world. 

First, we study the model dependence on the parameter V. Without the interaction 
term, this parameter would directly determine the population size, and it is normally 
called carrying capacity or Verhulst factor in models of population dynamics. The 
presence of the second term in Eq. ([7]) generates a different behavior. As it can be 
observed in Fig. [2^, the averaged total population size saturates for large V values. 
In contrast, after a short transient, for sufficiently large values of V, the number of 
languages, N^, increases linearly with V, as shown in Fig. [3^. 

Looking at Fig. [2)d, we can see how, for large V values, it is possible to rescale all 
the DPL towards a common scaling behavior. The V dependence of the distributions 
becomes clear when looking at the inset of the same figure. In fact, for small V one 
language completely dominates the system. By increasing V, the interaction term 
becomes weaker, and the fragmentation process takes over, leading to a larger number of 
languages. For even larger V values, the distribution moves towards smaller languages 
sizes maintaining its shape when displayed in a double-logarithmic scale as shown in 
Fig. Eb. 

In Fig. [3]d, we show the power law dependence of the total population size and 
of the number of languages with the parameter F, which controls the fragmentation 
process. 

Figure shows the shape of the DPL for different values of the parameter V and 
of the interaction strength /. It is possible to obtain a data collapse by rescaling the 
distribution with the value of V and the population size with the factor V/I. It is 
interesting to note that, in log-scale, the broadness of the distribution does not depend 
on V or /. As can be clearly observed, the shape of the curve deviates strongly from a 
lognormal shape for small and large population values. In fact, we collected data over 
sufficiently many decades to be able to show that the curves decay like a power law for 
small and large population sizes. 
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Figure 3. Local interactions, (a): tlic normalized number of languages N^/V plotted 
versus simulation time steps. These data indicate that increases linearly with V . 
Parameter values are / = 10®, F = 0.1 and different V values between 10000 and 10^. 
(b): Averaged total population size < Ntoi > and the normalized averaged number 
of languages < Nl > /V versus the fragmentation probability F {I = 10®). 



Figure Hb shows the shape of the distribution for different values of all parameters 
of the model. The fragmentation probability F changes the width of the distributions. 
For smaller F we obtain broader distributions. From the analysis of this data we obtain 
scaling relations for the maximum value of the distribution (scaling with VF'^'^ / 1) and 
for its normalization {VF^''^). This last scaling relation follows exactly from the V and 
F dependence on the averaged number of languages, N^., shown in Fig. ^p. 

Using these scaling relations we can estimate the parameters values for running a 
simulation that can reproduce the DPL of real data. In fact, these parameters corre- 
spond to the ones that allow the rescaling of the DPL of real data to collapse with the 
ones shown in FigHb- Taking F = 2 - 10^^ from adjusting the broadness of the distribu- 
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Figure 4. Local interactions, (a); Histogram collapse for different / and V. 
Apparently, the shape is not lognormal but a combination of two power laws. The 
parameter ranges for the 17 simulations are / = 10'^ — 10* and V = lO"' — 10^. F is 
fixed and equals 0.1. (b): Rescaled histogram for different values of the fragmentation 
probability F. The F-scaling in the horizontal axis is introduced in order to collapse 
the position of the maxima for different simulations. The parameter ranges are 
/ = 103 - 10^ different values of ^ and = 2 • 10^^ _ g.l. The smaller F, the 
broader is the distribution. 

tion, the values of the others parameters are J = 1.5 ■ 10^, V = b ■ 10^^. Unfortunately, 
these values correspond to simulations with too demanding computational time. For 
this reason, we are forced to just test the quality of the collapse of rescaled real data 
and the rescaled simulation, and we can not do a direct comparison. We carry out a 
simulation with the parameters F = 2 ■ 10~^, / = 10^ and V = 3.2 • 10^^. In Fig. [5l 
we can see how the DPL collapses well with our simulation if we neglect very small 
population sizes (L < 100). In addition, our model can reproduce the behavior of the 
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Figure 5. Local interactions. The real data (circles) is rescaled with the values 
/ = 1.5 • 10'^, V = 5 ■ 10^^ and F ~ 2 ■ 10^^\ These parameters correspond to the one 
that allow rescaling of real data with the ones shown in Fig [H The continuous line 
represents the rescaled simulation results. The parameters used in the simulation are: 
/ = 10^, 1/ = 3.2 • 10^2 and F = 2 • 10"^ 

fat tail well (which corresponds to frequently spoken languages). 



Mean field 

The increasing connection between people speaking different mother languages 
suggests exploring the behaviour of our model located on a topology with more links 
than those of a 1-dimensional array. For this reason, we decided to explore the other 
limiting situation: a fully connected model (mean field like description). Even if this is 
a quite unrealistic implementation, we would like to estimate the upper bound for our 
dynamics, corresponding to a fully globalized world. 

Figure [6] shows the normalized total population, Ntoi/V, and the normalized 
language number, Nl/V, for different V values. In this topology, both quantities 
approximately grow linearly with V and Ntoi does not reach any saturation value. 
This novel behavior is due to the different form of carrying out the sum in the second 
term of Eq. [3 In fact, in this implementation, the sum is taken over all languages and 
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Figure 6. Mean field interactions. The normafized total population (a) and the 
normalized number of languages (b) versus simulation time steps. We fixed / = 100 
and F = 0.001 and varied V from 10^ to 32 • 10^. The figures show that both quantities 
approximately increase linearly with V . For smaller V , the stationary state is reached 
faster. 



not only over the two neighbors, making the sum to have values of the order of Ntot- 

For large values oi V , the distribution approaches a fixed lognormal-like shape. 
In Fig [3, we show the data collapse for different values of the parameters V , I and 
F . Surprisingly, the scaling relations are really simple and all DPLs can be perfectly 
collapsed into each other. On the logarithmic scale, the width of the distribution 
remains fixed and it does not change when varying the parameters values. Even the in- 
dependence, leading to a different broadness of the DPL in the previous implementation, 
disappears. The distribution clearly shows two power law decays with exponents 1 and 
-2. 

These results are quite intriguing. On the one hand, the fixed value of the 
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Figure 7. Mean field interactions. Data collapse for the distribution of language size in 
terms of speaker population. Sufficiently large total populations arc considered (large V 
values). Also for this implementation, the distributions show fat tails, with exponents 
1 and -2. The parameter ranges are within / = 10 - 100, 000, V = 10, 000 - 10^ and 
F = 0.001 - 0.1. 
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distribution broadness does not allow the reproduction of the real data of actual language 
sizes. On the other hand, it suggests an interesting conjecture. If the increasing 
connection between people, in the future, will allow the description of their interaction 
using a mean field like approximation, our model may provide some predictions. 
Independently on the population growth, the width of the DPL distribution will get 
narrower. This means that, despite of the probable growth of the world population, 
more languages will become extinct than new languages appear. The number of less 
spoken languages will strongly decrease. 

4. Conclusions 

A careful analysis of the distribution of population sizes of languages (DPL) suggests 
that the behavior of its tail is better described by a power law. For this reason, a fit of 
the real data using the so-called double-Pareto lognormal distribution generates better 
results than the fit with a pure lognormal distribution. 

These deviations from a pure lognormal distribution mean that simple models that 
reproduce exactly such distributions must be improved. We implemented a new toy 
model on a 1-dimensional topology that, starting from a macroscopic description at 
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the sub-population level, accounts for two general mechanisms: language competition 
and fragmentation. These ingredients are sufficient to generate distributions that well 
reproduce real data, particularly the behavior of frequently spoken languages (the right 
tail of the distribution). 

Moreover, we studied our model on a fully connected network, in which all sub- 
populations are in mutual contact (mean field like behavior). This implementation 
can be a good approximation for describing the trend for the interaction patterns on a 
future, characterized by more and more interconnected communities. This model version 
allows a simple prediction that confirms previous conjectures about the future mass 
extinction of languages [I7j: despite of the probable growth of the world's population, 
more languages will become extinct than new languages will appear. 

Finally, we want to recall that, as our toy model is based on simple and general 
rules, it may be characteristic for systems other than language evolution as well. In 
fact, systems with an interplay between a fragmentation process and a size dependent 
growth should exhibit similar patterns. A good example comes from the description 
of the growth of companies [32l [33l [M] or from the analysis of the mutual fund size 
distribution [35]. In these systems it is possible to describe the size distributions in terms 
of log- normal-like distributions In fact, as a zeroth order approximation, log-normal 
distributions can naturally be generated by a multiplicative growth processes, in which 
the company size, at any given time, is given as a multiplicative factor times the size 
of the company at a previous time. For the logarithm of the company size, this process 
becomes an additive process and so the distribution converges to a normal distribution, 
obeying the central limit theorem (Gibrat's law [36] and theory of breakage [37]). Small 
variations in this approach, as the introduction of some type of cutoff, can modify 
the lognormal distribution into a heavy tailed distribution [22]. This is obtained, for 
example, by introducing creation and annihilation processes [38| [39] . At this level of 
description, none of the components of the random process depend on size. Anyway, it is 
well known that the growth rates of companies are size dependent [32l B0| E] and once 
size effects are taken into account the predictions for the distribution can become more 
quantitative [35]. This description of the growth of companies is similar to our approach, 
in the sense that the dynamics of creation and annihilation in growing phenomena 
can be analogously described introducing the process of fragmentation as well as its 
counterpart, the phenomenon of coalescence. Our work points out the importance of 
taking into account these mechanisms, in modelling systems usually described in terms 
of birth and death processes or random growth [l2] . 
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